raman spectra
Rapid Machine Learning-Driven Detection of Pesticides and Dyes Using Raman Spectroscopy
Binh, Quach Thi Thai, Phuoc, Thuan, Hai, Xuan, Phan, Thang Bach, Thu, Vu Thi Hanh, Hung, Nguyen Tuan
The extensive use of pesticides and synthetic dyes poses critical threats to food safety, human health, and environmental sustainability, necessitating rapid and reliable detection methods. Raman spectroscopy offers molecularly specific fingerprints but suffers from spectral noise, fluorescence background, and band overlap, limiting its real-world applicability. Here, we propose a deep learning framework based on ResNet-18 feature extraction, combined with advanced classifiers, including XGBoost, SVM, and their hybrid integration, to detect pesticides and dyes from Raman spectroscopy, called MLRaman. The MLRaman with the CNN-XGBoost model achieved a predictive accuracy of 97.4% and a perfect AUC of 1.0, while it with the CNN-SVM model provided competitive results with robust class-wise discrimination. Dimensionality reduction analyses (PCA, t-SNE, UMAP) confirmed the separability of Raman embeddings across 10 analytes, including 7 pesticides and 3 dyes. Finally, we developed a user-friendly Streamlit application for real-time prediction, which successfully identified unseen Raman spectra from our independent experiments and also literature sources, underscoring strong generalization capacity. This study establishes a scalable, practical MLRaman model for multi-residue contaminant monitoring, with significant potential for deployment in food safety and environmental surveillance.
- Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.05)
- Asia > Vietnam > Thái Bình Province > Thái Bình (0.04)
- Asia > Vietnam > Bình Thuận Province (0.04)
- Asia > Taiwan > Taiwan Province > Taipei (0.04)
- Materials > Chemicals > Agricultural Chemicals (1.00)
- Health & Medicine (1.00)
- Food & Agriculture > Agriculture > Pest Control (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Uncertainty-aware Physics-informed Neural Networks for Robust CARS-to-Raman Signal Reconstruction
Venkataramanan, Aishwarya, Vemuri, Sai Karthikeya, Valapil, Adithya Ashok Chalain, Denzler, Joachim
Coherent anti-Stokes Raman scattering (CARS) spectroscopy is a powerful and rapid technique widely used in medicine, material science, and chemical analyses. However, its effectiveness is hindered by the presence of a non-resonant background that interferes with and distorts the true Raman signal. Deep learning methods have been employed to reconstruct the true Raman spectrum from measured CARS data using labeled datasets. A more recent development integrates the domain knowledge of Kramers-Kronig relationships and smoothness constraints in the form of physics-informed loss functions. However, these deterministic models lack the ability to quantify uncertainty, an essential feature for reliable deployment in high-stakes scientific and biomedical applications. In this work, we evaluate and compare various uncertainty quantification (UQ) techniques within the context of CARS-to-Raman signal reconstruction. Furthermore, we demonstrate that incorporating physics-informed constraints into these models improves their calibration, offering a promising path toward more trustworthy CARS data analysis.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Mexico > Gulf of Mexico (0.04)
- Europe > Portugal > Braga > Braga (0.04)
- Europe > Germany (0.04)
RamPINN: Recovering Raman Spectra From Coherent Anti-Stokes Spectra Using Embedded Physics
Vemuri, Sai Karthikeya, Valapil, Adithya Ashok Chalain, Büchner, Tim, Denzler, Joachim
Transferring the recent advancements in deep learning into scientific disciplines is hindered by the lack of the required large-scale datasets for training. We argue that in these knowledge-rich domains, the established body of scientific theory provides reliable inductive biases in the form of governing physical laws. We address the ill-posed inverse problem of recovering Raman spectra from noisy Coherent Anti-Stokes Raman Scattering (CARS) measurements, as the true Raman signal here is suppressed by a dominating non-resonant background. We propose RamPINN, a model that learns to recover Raman spectra from given CARS spectra. Our core methodological contribution is a physics-informed neural network that utilizes a dual-decoder architecture to disentangle resonant and non-resonant signals. This is done by enforcing the Kramers-Kronig causality relations via a differentiable Hilbert transform loss on the resonant and a smoothness prior on the non-resonant part of the signal. Trained entirely on synthetic data, RamPINN demonstrates strong zero-shot generalization to real-world experimental data, explicitly closing this gap and significantly outperforming existing baselines. Furthermore, we show that training with these physics-based losses alone, without access to any ground-truth Raman spectra, still yields competitive results. This work highlights a broader concept: formal scientific rules can act as a potent inductive bias, enabling robust, self-supervised learning in data-limited scientific domains.
- Europe > Switzerland > Basel-City > Basel (0.04)
- Europe > Portugal > Braga > Braga (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Health & Medicine (0.93)
- Materials > Chemicals > Commodity Chemicals > Petrochemicals (0.47)
Reevaluating Convolutional Neural Networks for Spectral Analysis: A Focus on Raman Spectroscopy
Soysal, Deniz, García-Andrade, Xabier, Rodriguez, Laura E., Sobron, Pablo, Barge, Laura M., Detry, Renaud
Autonomous Raman instruments on Mars rovers, deep-sea landers, and field robots must interpret raw spectra distorted by fluorescence baselines, peak shifts, and limited ground-truth labels. Using curated subsets of the RRUFF database, we evaluate one-dimensional convolutional neural networks (CNNs) and report four advances: (i) Baseline-independent classification: compact CNNs surpass $k$-nearest-neighbors and support-vector machines on handcrafted features, removing background-correction and peak-picking stages while ensuring reproducibility through released data splits and scripts. (ii) Pooling-controlled robustness: tuning a single pooling parameter accommodates Raman shifts up to $30 \,\mathrm{cm}^{-1}$, balancing translational invariance with spectral resolution. (iii) Label-efficient learning: semi-supervised generative adversarial networks and contrastive pretraining raise accuracy by up to $11\%$ with only $10\%$ labels, valuable for autonomous deployments with scarce annotation. (iv) Constant-time adaptation: freezing the CNN backbone and retraining only the softmax layer transfers models to unseen minerals at $\mathcal{O}(1)$ cost, outperforming Siamese networks on resource-limited processors. This workflow, which involves training on raw spectra, tuning pooling, adding semi-supervision when labels are scarce, and fine-tuning lightly for new targets, provides a practical path toward robust, low-footprint Raman classification in autonomous exploration.
- Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.05)
- North America > United States > California (0.04)
- North America > United States > Texas > Harris County > Houston (0.04)
- (2 more...)
- Energy (0.93)
- Health & Medicine > Therapeutic Area (0.46)
Cross-Modal Characterization of Thin Film MoS$_2$ Using Generative Models
Moses, Isaiah A., Chen, Chen, Redwing, Joan M., Reinhart, Wesley F.
The growth and characterization of materials using empirical optimization typically requires a significant amount of expert time, experience, and resources. Several complementary characterization methods are routinely performed to determine the quality and properties of a grown sample. Machine learning (ML) can support the conventional approaches by using historical data to guide and provide speed and efficiency to the growth and characterization of materials. Specifically, ML can provide quantitative information from characterization data that is typically obtained from a different modality. In this study, we have investigated the feasibility of projecting the quantitative metric from microscopy measurements, such as atomic force microscopy (AFM), using data obtained from spectroscopy measurements, like Raman spectroscopy. Generative models were also trained to generate the full and specific features of the Raman and photoluminescence spectra from each other and the AFM images of the thin film MoS$_2$. The results are promising and have provided a foundational guide for the use of ML for the cross-modal characterization of materials for their accelerated, efficient, and cost-effective discovery.
- North America > United States > Pennsylvania > Centre County > University Park (0.04)
- Asia > China (0.04)
- Health & Medicine (0.70)
- Materials > Chemicals (0.67)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Vision (0.93)
Deep-Learning Investigation of Vibrational Raman Spectra for Plant-Stress Analysis
Patil, Anoop C., Sng, Benny Jian Rong, Chang, Yu-Wei, Pereira, Joana B., Nam-Hai, Chua, Sarojam, Rajani, Singh, Gajendra Pratap, Jang, In-Cheol, Volpe, Giovanni
Detecting stress in plants is crucial for both open-farm and controlled-environment agriculture. Biomolecules within plants serve as key stress indicators, offering vital markers for continuous health monitoring and early disease detection. Raman spectroscopy provides a powerful, non-invasive means to quantify these biomolecules through their molecular vibrational signatures. However, traditional Raman analysis relies on customized data-processing workflows that require fluorescence background removal and prior identification of Raman peaks of interest-introducing potential biases and inconsistencies. Here, we introduce DIVA (Deep-learning-based Investigation of Vibrational Raman spectra for plant-stress Analysis), a fully automated workflow based on a variational autoencoder. Unlike conventional approaches, DIVA processes native Raman spectra-including fluorescence backgrounds-without manual preprocessing, identifying and quantifying significant spectral features in an unbiased manner. We applied DIVA to detect a range of plant stresses, including abiotic (shading, high light intensity, high temperature) and biotic stressors (bacterial infections). By integrating deep learning with vibrational spectroscopy, DIVA paves the way for AI-driven plant health assessment, fostering more resilient and sustainable agricultural practices.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Sweden > Vaestra Goetaland > Gothenburg (0.05)
- Asia > Singapore > Central Region > Singapore (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Workflow (0.86)
- Health & Medicine > Consumer Health (1.00)
- Food & Agriculture > Agriculture (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.34)
- Health & Medicine > Therapeutic Area > Immunology (0.34)
A Self-supervised Learning Method for Raman Spectroscopy based on Masked Autoencoders
Ren, Pengju, Zhou, Ri-gui, Li, Yaochong
Raman spectroscopy serves as a powerful and reliable tool for analyzing the chemical information of substances. The integration of Raman spectroscopy with deep learning methods enables rapid qualitative and quantitative analysis of materials. Most existing approaches adopt supervised learning methods. Although supervised learning has achieved satisfactory accuracy in spectral analysis, it is still constrained by costly and limited well-annotated spectral datasets for training. When spectral annotation is challenging or the amount of annotated data is insufficient, the performance of supervised learning in spectral material identification declines. In order to address the challenge of feature extraction from unannotated spectra, we propose a self-supervised learning paradigm for Raman Spectroscopy based on a Masked AutoEncoder, termed SMAE. SMAE does not require any spectral annotations during pre-training. By randomly masking and then reconstructing the spectral information, the model learns essential spectral features. The reconstructed spectra exhibit certain denoising properties, improving the signal-to-noise ratio (SNR) by more than twofold. Utilizing the network weights obtained from masked pre-training, SMAE achieves clustering accuracy of over 80% for 30 classes of isolated bacteria in a pathogenic bacterial dataset, demonstrating significant improvements compared to classical unsupervised methods and other state-of-the-art deep clustering methods. After fine-tuning the network with a limited amount of annotated data, SMAE achieves an identification accuracy of 83.90% on the test set, presenting competitive performance against the supervised ResNet (83.40%).
- Asia > China > Shanghai > Shanghai (0.05)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
A Graph Based Raman Spectral Processing Technique for Exosome Classification
Ngo, Vuong M., Bolger, Edward, Goodwin, Stan, O'Sullivan, John, Cuong, Dinh Viet, Roantree, Mark
Exosomes are small vesicles crucial for cell signaling and disease biomarkers. Due to their complexity, an "omics" approach is preferable to individual biomarkers. While Raman spectroscopy is effective for exosome analysis, it requires high sample concentrations and has limited sensitivity to lipids and proteins. Surface-enhanced Raman spectroscopy helps overcome these challenges. In this study, we leverage Neo4j graph databases to organize 3,045 Raman spectra of exosomes, enhancing data generalization. To further refine spectral analysis, we introduce a novel spectral filtering process that integrates the PageRank Filter with optimal Dimensionality Reduction. This method improves feature selection, resulting in superior classification performance. Specifically, the Extra Trees model, using our spectral processing approach, achieves 0.76 and 0.857 accuracy in classifying hyperglycemic, hypoglycemic, and normal exosome samples based on Raman spectra and surface, respectively, with group 10-fold cross-validation. Our results show that graph-based spectral filtering combined with optimal dimensionality reduction significantly improves classification accuracy by reducing noise while preserving key biomarker signals. This novel framework enhances Raman-based exosome analysis, expanding its potential for biomedical applications, disease diagnostics, and biomarker discovery.
- Europe > Ireland (0.05)
- Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
SpecReX: Explainable AI for Raman Spectroscopy
Blake, Nathan, Kelly, David A., Chanchal, Akchunya, Kapllani-Mucaj, Sarah, Thomas, Geraint, Chockler, Hana
Raman spectroscopy is becoming more common for medical diagnostics with deep learning models being increasingly used to leverage its full potential. However, the opaque nature of such models and the sensitivity of medical diagnosis together with regulatory requirements necessitate the need for explainable AI tools. We introduce SpecReX, specifically adapted to explaining Raman spectra. SpecReX uses the theory of actual causality to rank causal responsibility in a spectrum, quantified by iteratively refining mutated versions of the spectrum and testing if it retains the original classification. The explanations provided by SpecReX take the form of a responsibility map, highlighting spectral regions most responsible for the model to make a correct classification. To assess the validity of SpecReX, we create increasingly complex simulated spectra, in which a "ground truth" signal is seeded, to train a classifier. We then obtain SpecReX explanations and compare the results with another explainability tool. By using simulated spectra we establish that SpecReX localizes to the known differences between classes, under a number of conditions. This provides a foundation on which we can find the spectral features which differentiate disease classes. This is an important first step in proving the validity of SpecReX.
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Monaco (0.04)
- Health & Medicine > Diagnostic Medicine (0.87)
- Health & Medicine > Therapeutic Area > Oncology (0.68)
QMe14S, A Comprehensive and Efficient Spectral Dataset for Small Organic Molecules
Yuan, Mingzhi, Zou, Zihan, Hu, Wei
Developing machine learning protocols for molecular simulations requires comprehensive and efficient datasets. Here we introduce the QMe14S dataset, comprising 186,102 small organic molecules featuring 14 elements (H, B, C, N, O, F, Al, Si, P, S, Cl, As, Se, Br) and 47 functional groups. Using density functional theory at the B3LYP/TZVP level, we optimized the geometries and calculated properties including energy, atomic charge, atomic force, dipole moment, quadrupole moment, polarizability, octupole moment, first hyperpolarizability, and Hessian. At the same level, we obtained the harmonic IR, Raman and NMR spectra. Furthermore, we conducted ab initio molecular dynamics simulations to generate dynamic configurations and extract nonequilibrium properties, including energy, forces, and Hessians. By leveraging our E(3)-equivariant message-passing neural network (DetaNet), we demonstrated that models trained on QMe14S outperform those trained on the previously developed QM9S dataset in simulating molecular spectra. The QMe14S dataset thus serves as a comprehensive benchmark for molecular simulations, offering valuable insights into structure-property relationships.
- North America > United States > Connecticut > New Haven County > Wallingford (0.04)
- Asia > China > Anhui Province > Hefei (0.04)